Yale  University 

Department  of  Computer  Science 


This  work  was  supported  by  the  DoD  University  Research  Initiative  (URI)  administered 
by  the  Office  of  Naval  Research  under  Grant  N00014-01-1-0795. 

¹Department of Computer Science, Yale University, New Haven, CT 06520-8285, USA. Email:
aspnes@cs.yale.edu. Supported in part by NSF.

²Department of Computer Science, Yale University, New Haven, CT 06520-8285, USA. Email:
joan.feigenbaum@yale.edu. Supported in part by NSF and ONR.

³Department of Computer Science, Yale University, New Haven, CT 06520-8285, USA. Email:
aleksandr.yampolskiy@yale.edu. Supported by NSF.

⁴Department of Computer Science, Yale University, New Haven, CT 06520-8285, USA. Email:
sheng.zhong@yale.edu. Supported by NSF.


Report Documentation Page (Standard Form 298, Rev. 8-98; OMB No. 0704-0188)

Report date: 26 March 2004. Dates covered: 00-03-2004 to 00-03-2004.
Title: Towards a Theory of Data Entanglement.
Performing organization: Yale University, Department of Computer Science, PO Box 208285,
New Haven, CT 06520-8285.
Distribution/availability: Approved for public release; distribution unlimited.
Security classification (report, abstract, this page): unclassified. Number of pages: 29.


Towards  a  Theory  of  Data  Entanglement 

James Aspnes¹  Joan Feigenbaum²  Aleksandr Yampolskiy³  Sheng Zhong⁴


Abstract 

We propose a formal model for data entanglement as used in storage systems like Dagster [25]
and Tangler [26]. These systems split data into blocks in such a way that a single block becomes
a part of several documents; these documents are said to be entangled. Dagster and Tangler
use entanglement in conjunction with other techniques to deter a censor from tampering with
unpopular data. In this paper, we assume that entanglement is a goal in itself. We measure
the strength of a system by how thoroughly documents are entangled with one another and
how attempting to remove a document affects the other documents in the system. We argue
that while Dagster and Tangler achieve their stated goals, they do not achieve ours. In particular,
we prove that deleting a typical document in Dagster affects, on average, only a constant
number of other documents; in Tangler, it affects virtually no other documents. This motivates
us to propose two stronger notions of entanglement, called dependency and all-or-nothing
integrity. All-or-nothing integrity binds the users' data so that it is hard to delete or modify
the data of any one user without damaging the data of all users. We study these notions in six
submodels, differentiated by the choice of users' recovery algorithms and restrictions placed on
the adversary. In each of these models, we not only provide mechanisms for limiting the damage
done by the adversary, but also argue, under reasonable cryptographic assumptions, that no
stronger mechanisms are possible.


1  Introduction 

Suppose  that  I  provide  you  with  remote  storage  for  your  most  valuable  information.  I  may  advertise 
various  desirable  properties  of  my  service:  underground  disk  farms  protected  from  nuclear  attack, 
daily  backups  to  chiseled  granite  monuments,  replication  to  thousands  of  sites  scattered  across  the 
globe.  But  what  assurance  do  you  have  that  I  will  not  maliciously  delete  your  data  as  soon  as  your 
subscription  check  clears? 

To convince you that you will not lose your data at my random whim, I might offer stronger
technical guarantees. Two storage systems, Dagster [25] and Tangler [26], have proposed linking
the fate of your data to the data of other users. The intuition behind these systems is that data are
partitioned into blocks in such a way that every block can be used to reconstruct several documents.
New documents are represented using some number of existing blocks, chosen randomly from the pool,
combined with new blocks created using exclusive-or (Dagster) or 3-out-of-4 secret sharing [22]
(Tangler). Two documents that share a server block are said to be entangled.

Entangling new documents with old ones provides an incentive to retain blocks, as the loss
of a particular block might render many important documents inaccessible. Dagster and Tangler
use entanglement as one of many mechanisms to discourage negligent or malicious destruction of
data; others involve disguising the ownership and contents of documents and (in Tangler) storing
documents redundantly. The present work is motivated in part by the question of whether these
additional mechanisms are necessary, or whether entanglement by itself can act as an insurmountable
barrier to malicious censorship.

We begin by analyzing the use of entanglement in Dagster and Tangler in Section 2. We
argue that the notion of entanglement provided by Dagster and Tangler is not by itself sufficiently
strong to discourage censorship by punishing data loss, as not enough documents get deleted on
average if an adversary destroys a block for some targeted document. In particular, we show in
Section 2.3 that destroying a typical document in Dagster requires destroying only a constant
number of additional documents on average, even if the adversary is restricted to the very limited
attack of deleting a single block chosen uniformly at random from the blocks that make up the
document. The situation with Tangler is worse: because Tangler uses 3-out-of-4 secret sharing [22]
to store blocks, deleting two blocks from a particular document destroys the document without
destroying any others (which lose at most one block each) in the typical case.

These properties of Dagster and Tangler arise from the particular strategies used to partition
documents among blocks, which are determined partly by other goals not related to entanglement:
the desire to maintain scalability by using only a constant number of existing blocks in both systems
and the desire to protect documents against non-targeted faults in Tangler. We can easily imagine
modified systems that increase the level of entanglement by sacrificing these goals. However, it
will still be the case that such systems are vulnerable to more sophisticated attacks than simply
deleting individual blocks (we describe some of these attacks later). Furthermore, the simple
block-sharing notion of entanglement used in these systems creates only a tenuous connection between
the survival of individual documents. Although it is often the case that destroying one document
will destroy some others, there are no specific other documents that will necessarily be destroyed.
Thus Dagster and Tangler must rely on additional mechanisms to discourage deletions.

Our objective in the present work is to examine the possibility of obtaining stronger notions of
entanglement, in which the fates of specific documents are directly tied together. These stronger
notions might be enough by themselves to deter censorship, in that destroying a particular document,
even if done by a very sophisticated adversary, could require destroying most or all of the
other documents in the system. A system that provides such tight binding between documents
gives a weak form of censorship resistance; though we cannot guarantee that no documents will be
lost, we can guarantee that no documents will be lost unless the adversary burns down the library.
Under the assumption that the adversary can destroy data at will, this may be the best guarantee
we can hope to offer.

In Section 3, we define our model for a document-storage system in which the adversary is
allowed to modify the common data store after all documents have been stored. Such modifications
may include the block-deletion attacks that Dagster and Tangler are designed to resist, but they
may also include more sophisticated attacks such as replacing parts of the store with modified data,
or superencrypting all or part of the store.

In  addition  to  modifying  the  data  store,  in  some  variants  of  the  model  the  adversary  is  permitted 
to  carry  out  what  we  call  an  upgrade  attack  (see  Section  3.2),  in  which  the  adversary  offers  all 
interested  users  the  choice  of  adopting  a  new  algorithm  to  recover  their  data  from  the  common 
store  if  they  find  that  the  old  one  no  longer  works.  Allowing  such  upgrade  attacks  is  motivated 
by  the  observation  that  a  selfish  user  will  jump  at  the  chance  to  get  his  data  back  (especially  if  he 
has  the  ability  to  distinguish  genuine  from  false  data)  if  the  alternative  is  losing  the  data.  Upgrade 
attacks  also  exclude  dependency  mechanisms  that  rely  on  excessive  fastidiousness  in  the  recovery 
algorithm,  as  one  might  see  in  a  recovery  algorithm  that  politely  declines  to  return  its  user’s  data 
if  it  detects  that  some  other  user’s  data  has  been  lost. 

In  Section  4,  we  propose  two  stronger  notions  of  entanglement  in  the  context  of  our  model: 
document  dependency  and  all-or-nothing  integrity.  If  a  document  of  user  A  depends  on 
a  document  of  user  B,  then  A  can  recover  her  document  only  if  B  can.  Unlike  entanglement, 
document  dependency  is  an  externally-visible  property  of  a  system;  it  does  not  require  knowing 
how  the  system  stores  its  data,  but  only  observing  which  documents  can  still  be  recovered  after  an 
attack.  All-or-nothing  integrity  is  the  ultimate  form  of  document  dependency,  in  which  every  user’s 
document  depends  on  every  other  user’s,  so  that  either  every  user  gets  his  data  back  or  no  user 
does. Our stronger notions imply Dagster's and Tangler's notion of entanglement but also ensure
that  an  adversary  cannot  delete  a  document  without  much  more  serious  costs. 

The main part of the present work examines what security guarantees can be provided
depending on the assumptions made in the model. In Section 5, we consider how the possibility
or impossibility of providing document dependency depends on the choice of permitted adversary
attacks. Section 5.1 shows the property, hinted at above, that detecting tampering using a MAC
suffices for obtaining all-or-nothing integrity if all users use a standard "polite" recovery algorithm.
Section 5.2 shows that all-or-nothing integrity can no longer be achieved for an unrestricted
adversary if we allow upgrade attacks. In Section 5.3, we show how to obtain a weaker guarantee
that we call symmetric recovery even with the most powerful adversary; here, each document
is equally likely to be destroyed by any attack on the data store. This approach models systems
that attempt to prevent censorship by disguising where documents are stored. In Section 5.4, we
show that it is possible to achieve all-or-nothing integrity despite upgrade attacks if the adversary
can only modify the common data store in a way that destroys entropy, a generalization of the
block-deleting attacks considered in Dagster and Tangler.

In  Section  6,  we  discuss  some  related  work  on  untrusted  storage  systems. 

Finally, in Section 7, we discuss the strengths and limitations of our approach, and offer
suggestions for future work in this area.

2  Dagster and Tangler

We review how Dagster [25] and Tangler [26] work in Sections 2.1 and 2.2, respectively. We
describe these systems at the block level and omit details of how they break a document into
blocks and assemble blocks back into a document. In Section 2.3, we analyze the intuitive notion of
entanglement provided by these systems, pointing out some of this notion's shortcomings if (unlike
in Dagster and Tangler) it is the only mechanism provided to deter censorship. This motivates us
to search for stronger notions of entanglement in Section 4.

2.1  Dagster

The Dagster storage system may run on a single server or on a P2P overlay network. Each Dagster
server enumerates the blocks in a Merkle hash tree [17], which makes it easy for the users to verify
the integrity of any server block. The tree stores the actual data at its leaves, and each internal node
contains a hash of its children. Each document in Dagster consists of $c + 1$ server blocks: $c$ blocks of
older documents and one new block, which is an exclusive-or of previous blocks with the encrypted
document. The storage protocol proceeds as follows:

Initialization  Upon  startup,  the  server  creates  approximately  1,000  random  server  blocks  and 
adds  them  to  the  system. 

Entanglement  To publish document $d_i$, user $i$ generates a random key $k_i$. He then chooses $c$
random server blocks $C_{i_1}, \ldots, C_{i_c}$ and computes a new block

$$C_i = \mathcal{E}_{k_i}(d_i) \oplus \bigoplus_{j=1 \ldots c} C_{i_j},$$

where $\mathcal{E}$ is a plaintext-aware encryption function¹ and $\oplus$ is bitwise exclusive-or.

The user periodically queries the hash tree until he is convinced that block $C_i$ was selected
by other users. He then releases instructions on recovering $d_i$, which come in the form of a
Dagster Resource Locator (DRL), a list of hashes of the blocks needed to reconstruct $d_i$:

$$\bigl(k_i,\ \pi\left[H(C_i), H(C_{i_1}), H(C_{i_2}), \ldots, H(C_{i_c})\right]\bigr).$$

Here $H(\cdot)$ is a cryptographic hash function and $\pi$ is a random permutation.

Recovery  To recover $d_i$, the user asks the server for the blocks whose hashes are listed in the
DRL of $d_i$. If the hashes of the blocks returned by the server match the ones in the DRL, the user
computes

$$\mathcal{D}_{k_i}\Bigl(C_i \oplus \bigoplus_{j=1 \ldots c} C_{i_j}\Bigr),$$

where $\mathcal{D}$ represents a decryption function. Otherwise, the user exits.
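The entanglement and recovery steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Dagster's implementation: SHA-256 stands in for $H$, the 32-byte block size and the helper names (`init_store`, `publish`, `recover`) are hypothetical, and the plaintext-aware encryption step is omitted, so the functions operate on an already-encrypted document and the key $k_i$ is left out of the locator.

```python
import hashlib
import secrets

BLOCK_SIZE = 32  # bytes; hypothetical block size chosen for this sketch


def block_hash(block: bytes) -> str:
    """H(.): SHA-256 is an assumption; the text only requires a cryptographic hash."""
    return hashlib.sha256(block).hexdigest()


def xor_blocks(*blocks: bytes) -> bytes:
    """Bitwise exclusive-or of equal-length byte strings."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for pos, byte in enumerate(block):
            out[pos] ^= byte
    return bytes(out)


def init_store(n_blocks: int) -> dict:
    """Initialization: fill the server with random blocks, indexed by hash."""
    store = {}
    for _ in range(n_blocks):
        block = secrets.token_bytes(BLOCK_SIZE)
        store[block_hash(block)] = block
    return store


def publish(store: dict, ciphertext: bytes, c: int) -> list:
    """Entangle an encrypted document with c random existing blocks.

    Returns the permuted list of c + 1 block hashes (the hash part of a DRL).
    """
    rng = secrets.SystemRandom()
    chosen = rng.sample(sorted(store), c)
    new_block = xor_blocks(ciphertext, *(store[h] for h in chosen))
    store[block_hash(new_block)] = new_block
    hashes = [block_hash(new_block)] + chosen
    rng.shuffle(hashes)  # plays the role of the random permutation
    return hashes


def recover(store: dict, drl_hashes: list):
    """Return the encrypted document, or None if any block is missing or altered."""
    blocks = []
    for h in drl_hashes:
        block = store.get(h)
        if block is None or block_hash(block) != h:
            return None  # "otherwise, the user exits"
        blocks.append(block)
    return xor_blocks(*blocks)
```

Because exclusive-or is commutative and each old block appears twice in the combined expression, the permutation of the hash list does not affect recovery. Deleting any one of the $c + 1$ blocks makes `recover` return `None`, which is the single-block attack analyzed in Section 2.3.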

2.2  Tangler

The Tangler storage system employs a network of servers. It derives its name from the use of
(3, 4) Shamir secret sharing (see [22] for details) to entangle the data. Each document is represented
by four server blocks, any three of which are sufficient to reconstruct the original document. The
blocks are replicated across a subset of Tangler servers, which is uniquely determined from the blocks'
cryptographic hashes. A data structure similar to the Dagster Resource Locator, called an inode, is
used to record the hashes of the blocks needed to reconstruct the document. Here is the storage
protocol:

Initialization  As in Dagster, the server is jump-started with a bunch of random blocks.

¹A plaintext-aware encryption function is one for which it is computationally infeasible to generate a valid
ciphertext without knowing the corresponding plaintext. See [3] for a formal definition.

Entanglement  The server blocks in Tangler play the role of Shamir shares. Each block is a pair
$(x, y)$, where $x$ is used as an argument to a polynomial over $GF(2^{16})$ and $y$ is the value of
the polynomial at $x$. The polynomial is uniquely determined by any three blocks, and it is
constructed in such a way that evaluating it at zero yields the actual data.

To publish document $d_i$, user $i$ downloads two random server blocks, $C_{i_1} = (x_1, y_1)$ and
$C_{i_2} = (x_2, y_2)$. He interpolates these blocks together with $(0, d_i)$ to form a quadratic polynomial
$f(\cdot)$. Evaluating $f(\cdot)$ at two different nonzero integers produces the new server blocks $C'_{i_1}$ and
$C'_{i_2}$. The user uploads the new blocks to the server. Finally, he adds the hashes of the blocks
needed to reconstruct $d_i$ (viz., the old blocks $C_{i_1}, C_{i_2}$ and the new blocks $C'_{i_1}, C'_{i_2}$) to $d_i$'s
inode.

Recovery  To recover his document, user $i$ sends a request for the blocks listed in $d_i$'s inode to a
subset of Tangler servers. Upon receiving three of $d_i$'s blocks, the user can reconstruct $f(\cdot)$
and compute $d_i = f(0)$.
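The interpolation at the heart of this protocol can be illustrated with a short Python sketch. It makes one loud simplification: arithmetic is done modulo the prime 65537 rather than in $GF(2^{16})$, purely to keep the code short, so it is not bit-compatible with Tangler. The function names are hypothetical, and documents are modeled as single integers below the modulus.

```python
import secrets

P = 65537  # prime modulus; an assumed stand-in for GF(2^16), not what Tangler uses


def interpolate_at(points, x):
    """Evaluate at x the unique polynomial of degree < len(points) through the points (mod P)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+)
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def publish(doc, old1, old2):
    """Entangle doc (an integer in [0, P)) with two existing blocks.

    old1 and old2 are (x, y) pairs with distinct nonzero x. Returns two new
    blocks lying on the quadratic through (0, doc), old1 and old2.
    """
    known = [(0, doc), old1, old2]
    used = {0, old1[0], old2[0]}
    new_blocks = []
    while len(new_blocks) < 2:
        x = secrets.randbelow(P - 1) + 1  # nonzero evaluation point
        if x not in used:
            used.add(x)
            new_blocks.append((x, interpolate_at(known, x)))
    return new_blocks


def recover(three_blocks):
    """Reconstruct the document f(0) from any three of its four blocks."""
    return interpolate_at(three_blocks, 0)
```

Since all four blocks lie on the same polynomial of degree at most two, any three of them determine it, which is why losing a single block is harmless while losing two is fatal.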


Figure 1: An entanglement graph is a bipartite graph from the set of documents to the set of server
blocks. An edge $(d_j, C_k)$ is in the graph if server block $C_k$ can be used to reconstruct document $d_j$.


2.3  Analysis  of  entanglement 

Let us take a "snapshot" of the contents of a Dagster or Tangler server. The server contains a set
of blocks $\{C_1, \ldots, C_m\}$ that comprise documents $\{d_1, \ldots, d_n\}$ of a group of users. (Here $m, n \in \mathbb{N}$
and $m > n$.)

Data are partitioned in such a way that each block becomes a part of several documents. We can
depict this documents-blocks relationship using an entanglement graph (see Figure 1). The
graph contains an edge $(d_j, C_k)$ if block $C_k$ can be used to reconstruct document $d_j$. Note that
even if the graph contains $(d_j, C_k)$, it may still be possible to reconstruct $d_j$ from other blocks
excluding $C_k$. Document nodes in Dagster's entanglement graph have out-degree $c + 1$, and
those in Tangler's have out-degree four. Entangled documents share one or more server blocks.
In Figure 1, documents $d_1$ and $d_n$ are entangled because they share server block $C_1$; meanwhile,
documents $d_1$ and $d_2$ are not entangled.

This notion of entanglement, provided by Dagster and Tangler, has several drawbacks. Even if
document $d_j$ is entangled with a specific document, it may still be possible to delete $d_j$ from the
server without affecting that particular document. For example, knowing that $d_n$ is entangled with
$d_1$, owned by some Very Important Person, may give solace to the owner of $d_n$ (refer to Figure 1),
who might assume that no adversary would dare incur the wrath of the VIP merely to destroy
$d_n$. However, as depicted in the figure, the adversary can still delete server blocks $C_2$ and $C_m$ and
corrupt $d_n$ but not $d_1$.
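The VIP scenario can be made concrete with a toy entanglement graph. The edges below are hypothetical, chosen only to be consistent with the prose (the figure's actual edges are not reproduced here), and the survival rule assumes Tangler-style recovery in which a document needs three of its four blocks.

```python
def entangled(graph, a, b):
    """Two documents are entangled if they share at least one server block."""
    return bool(graph[a] & graph[b])


def destroyed(graph, deleted, needed):
    """Documents left with fewer than `needed` intact blocks after a deletion."""
    return {doc for doc, blocks in graph.items() if len(blocks - deleted) < needed}


# Hypothetical graph: each document maps to the set of blocks it links to.
graph = {
    "d1": {"C1", "C3", "C4", "C5"},   # the VIP's document
    "d2": {"C6", "C7", "C8", "C9"},
    "dn": {"C1", "C2", "C10", "Cm"},  # shares C1 with d1
}

print(entangled(graph, "d1", "dn"))               # True
print(destroyed(graph, {"C2", "Cm"}, needed=3))   # {'dn'}
```

Deleting the two blocks that $d_n$ does not share with $d_1$ removes $d_n$ while leaving the VIP's document fully intact, so the shared block offers no protection.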

Moreover,  the  user  does  not  get  to  choose  the  documents  to  be  entangled  with  his  document; 
these  documents  are  chosen  randomly.  While  destroying  a  user’s  document  is  likely  to  destroy 
some  others,  there  are  no  specific  other  documents  that  will  necessarily  be  destroyed.  If  few 
other  documents  get  destroyed  on  average,  the  small  risk  of  accidentally  corrupting  an  important 
document  will  be  unlikely  to  deter  the  adversary  from  tampering  with  data. 

We now derive an upper bound on how many documents get destroyed if we delete a random
document from a Dagster or Tangler server. Intuitively, the earlier the document was uploaded
onto the server, the more documents it is entangled with and the more other documents will get
destroyed. Without loss of generality, we can assume that documents are numbered in the order
in which they were uploaded; namely, for all $1 \le j < n$, document $d_j$ was uploaded to the server
before $d_{j+1}$. We consider a restricted adversary, who randomly chooses several blocks of $d_j$ (one
block in Dagster; two in Tangler) and overwrites them with zeroes.

2.3.1  Deleting  a  targeted  document 

First,  we  show  the  expected  side-effects  of  deleting  the  j-th  document;  in  the  following  section  we 
use  this  to  calculate  the  effects  of  deleting  a  document  chosen  uniformly  at  random. 

Theorem 1  In a Dagster server with $n_0 = O(1)$ initial blocks and $n$ documents, where each
document is linked with $c$ pre-existing blocks, deleting a random block of document $d_j$ ($1 \le j \le n$)
destroys

$$O\!\left(c \log \frac{n}{j}\right)$$

other documents.

Proof: There are altogether $m = n_0 + n$ blocks stored on the server: $n_0$ initial blocks and $n$
data blocks. We label the data blocks $C_1, \ldots, C_n$. The initial blocks exist on the server before any
data blocks have been added. We label them $C_{-n_0+1}, \ldots, C_0$.

Every document $d_j$ consists of $c$ pre-existing blocks² and a data block $C_j$ that is computed
during the entanglement stage. Consider an adversary who destroys a random block $C_i$ of $d_j$. This
will destroy $d_j$, but it will also destroy any documents with edges outgoing to $C_i$ in the entanglement
graph. We would like to compute the number of such documents, $N_i$.

If $C_i$ is a data block (i.e., $i \ge 1$), then

[display equation lost in extraction]

Meanwhile, if $C_i$ is an initial block (i.e., $i < 1$), it can be linked by any of the documents:

$$[\ldots] = O(c \log n).$$

The number of documents deleted on average when the adversary destroys a random block of
$d_j$ is

[display equation (1) lost in extraction]

We can use Stirling's formula to bound the leading term in (1):

$$[\ldots] = O\!\left(c \log \frac{n}{j}\right).$$

²These may be either initial blocks or data blocks of documents added earlier than $d_j$ (i.e., $d_k$ with $k < j$).

The  theorem  follows. 
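As a rough numerical companion to Theorem 1, the sketch below computes the expected number of other documents linked to a randomly chosen block of $d_j$. It rests on a modeling assumption that goes slightly beyond the text: each document $d_k$ picks its $c$ old blocks uniformly without replacement from the $n_0 + k - 1$ blocks present at upload time, independently of the other documents, so a given pre-existing block is linked by $d_k$ with probability $c/(n_0 + k - 1)$ (this requires $n_0 \ge c$). The function name and the decomposition are ours, not the paper's.

```python
def expected_collateral(j, n, c, n0):
    """Expected number of documents other than d_j linked to a random block of d_j.

    Assumes every document picks its c old blocks uniformly at random among
    the blocks that already exist when it is uploaded (and that n0 >= c).
    """
    # p[k]: probability that d_k links a given block that existed before d_k
    p = [0.0] + [c / (n0 + k - 1) for k in range(1, n + 1)]
    # tail[a] = p[a] + p[a + 1] + ... + p[n]
    tail = [0.0] * (n + 2)
    for k in range(n, 0, -1):
        tail[k] = tail[k + 1] + p[k]

    # Case A: the adversary hits d_j's own data block C_j (probability 1/(c+1)).
    own = tail[j + 1]

    # Case B: the adversary hits one of d_j's c old blocks, which is uniformly
    # distributed over the n0 + j - 1 blocks that preceded d_j.
    old_total = 0.0
    for i in range(-n0 + 1, j):
        if i < 1:   # initial block: any document other than d_j may link it
            old_total += tail[1] - p[j]
        else:       # data block of d_i: its owner, plus later documents except d_j
            old_total += 1 + tail[i + 1] - p[j]
    old = old_total / (n0 + j - 1)

    return (own + c * old) / (c + 1)
```

For instance, evaluating `expected_collateral(j, 500, 15, 1000)` for a few values of `j` shows how the collateral damage depends on when the document was uploaded. The sketch counts linked documents, which coincides with destroyed documents in Dagster because every block of a document is needed for recovery.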
Theorem 2  In a Tangler server with $n_0 = O(1)$ initial blocks and $n$ documents, deleting two
random blocks of document $d_j$ ($1 \le j \le n$) destroys

$$O\!\left(\frac{1}{j}\right)$$

other documents.


Proof: The server contains $m = n_0 + 2n$ blocks, $n_0$ of which are initial blocks and $2n$ of which
are data blocks. We label the blocks as in the proof of Theorem 1. The initial blocks are
$C_{-n_0+1}, \ldots, C_0$ and the data blocks are $C_1, C_2, \ldots, C_{2n-1}, C_{2n}$.

In Tangler, every document $d_j$ consists of two old blocks of pre-existing documents and two new
blocks $C_{2j-1}$ and $C_{2j}$, computed during the entanglement stage. Suppose an adversary deletes any
two out of the four blocks comprising $d_j$; call these blocks $C_i, C_l$. Then any document $d_k$ ($k \ne j$) that
contains both $C_i$ and $C_l$ (viz., has edges outgoing to $C_i$ and $C_l$ in the entanglement graph) will
also get destroyed. We would like to compute the number of such documents, $N_{avg}$.

In our analysis, we consider whether the deleted blocks $C_i, C_l$ are new or old to $d_j$ and $d_k$. We
distinguish between five cases:


Case 1: $C_i, C_l$ are old to both $d_j$ and $d_k$. Then the number of deleted documents is

$$\sum_{k=1}^{j-1} \frac{1}{\binom{2j-2+n_0}{2}} + \sum_{k=j+1}^{n} \frac{1}{\binom{2k-2+n_0}{2}}.$$


Case 2: $C_i, C_l$ are old to $d_j$. However, only one of them is old to $d_k$, while the other is new to $d_k$.
Note that, in this case, we must have $k < j$.

$$\sum_{k=1}^{j-1} \frac{4}{\binom{2j-2+n_0}{2}}$$


Case 3: $C_i, C_l$ are old to $d_j$, but new to $d_k$. Note that, in this case, we must also have $k < j$.

$$\sum_{k=1}^{j-1} \frac{1}{\binom{2j-2+n_0}{2}}$$


Case 4: One of $C_i, C_l$ is old to $d_j$, the other is new to $d_j$. Note that both $C_i$ and $C_l$ must be old
to $d_k$ (because otherwise we would have $k < j$, which implies that the block in $C_i, C_l$ that is
new to $d_j$ is not linked to $d_k$, which further implies that $d_k$ will not get deleted).

$$\sum_{k=j+1}^{n} \frac{4}{\binom{2k-2+n_0}{2}}$$


Case 5: $C_i, C_l$ are new to $d_j$. In this case, both $C_i$ and $C_l$ must be old to $d_k$ (for the same reason
as in Case 4).


Summing up the five cases gives us the total number of documents destroyed:

[display equation lost in extraction; only the denominators $2j - 3$ and $2n - 3$ survive]

for large $n$.
2.3.2  Deleting a random document

Using  Theorem  1,  we  can  show  that  destroying  a  typical  document  in  Dagster  affects  few  other 
documents  on  average: 

Corollary 3  In a Dagster server, deleting a block of a random document destroys on average $O(c)$
other documents.

Proof: Suppose the server contains $n$ documents, each document linked with $c$ server blocks.
According to Theorem 1, deleting document $d_j$ ($1 \le j \le n$) affects

$$O\!\left(c \log \frac{n}{j}\right)$$

other documents.

Therefore, deleting a typical $d_j$ will affect

$$\frac{1}{n} \sum_{j=1}^{n} O\!\left(c \log \frac{n}{j}\right) = O\!\left(\frac{c}{n} \log \frac{n^n}{n!}\right) = O(c)$$

documents. ∎

Intuitively, one might expect the corollary to follow immediately from the fact that each
document depends on $c$ blocks, which means (when $n$ is large enough that the number of blocks and
documents are proportional) that the average block is used by $O(c)$ documents. While such an
argument does tell us what happens if the adversary deletes a block chosen uniformly at random,
choosing a document uniformly at random and then choosing a block from that document biases
the choice of block toward earlier blocks that are used in more documents; nonetheless, the corollary
shows that this bias provides at most a constant increase in the destructive effect.

Meanwhile, Theorem 2 tells us that destroying a document $d_j$ from the Tangler server affects
$O(1/j)$ other documents on average. Averaging this bound over $1 \le j \le n$ gives a harmonic sum
divided by $n$, from which it is immediate that:

Corollary 4  In a Tangler server, deleting two blocks of a random document destroys on average
$O\!\left(\frac{\log n}{n}\right)$ other documents.
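The two averages behind Corollaries 3 and 4 are easy to check numerically. The following sketch (function names are ours) evaluates the average of $\log(n/j)$ and of $1/j$ over $1 \le j \le n$, ignoring the constants hidden in the $O(\cdot)$ bounds; natural logarithms are an arbitrary choice that only changes those constants.

```python
import math


def average_log_ratio(n):
    """(1/n) * sum over j of ln(n/j); stays below 1 because n! >= (n/e)^n."""
    return sum(math.log(n / j) for j in range(1, n + 1)) / n


def average_reciprocal(n):
    """(1/n) * sum over j of 1/j, i.e. the n-th harmonic number divided by n."""
    return sum(1 / j for j in range(1, n + 1)) / n


for n in (10, 100, 1000):
    print(n, average_log_ratio(n), average_reciprocal(n))
```

The first average stays bounded by a constant as $n$ grows, matching the $O(c)$ bound for Dagster, while the second shrinks roughly like $(\ln n)/n$, matching the near-zero collateral damage in Tangler.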

We  illustrate  the  limited  effectiveness  of  this  form  of  entanglement  by  itself  as  a  censorship 
deterrent  with  a  simple  example: 

Example: Suppose there are $n = 500$ documents stored on a Dagster server, where each document
is linked with $c = 15$ blocks. Further, suppose five of the documents belong to a VIP. From
Corollary 3, we know that deleting a block of a typical document destroys roughly 15 other documents.
The probability that one of the destroyed documents will belong to the VIP is approximately 1/7.
The adversary has a six in seven chance of deleting a typical document without affecting the VIP's
data. □
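The figure in this example can be reproduced with a short calculation. The sketch assumes, as a simplification not spelled out in the text, that the targeted document is not one of the VIP's and that the 15 destroyed documents form a uniformly random subset of the other 499; the function name is hypothetical.

```python
from math import comb


def prob_vip_hit(n_docs, n_vip, n_destroyed):
    """Probability that a uniform random set of destroyed documents includes a VIP document."""
    others = n_docs - 1  # the targeted document itself is excluded
    miss = comb(others - n_vip, n_destroyed) / comb(others, n_destroyed)
    return 1 - miss


p = prob_vip_hit(n_docs=500, n_vip=5, n_destroyed=15)
print(round(p, 3))  # close to 1/7
```

Under these assumptions the adversary misses the VIP's documents about 86% of the time, in line with the six-in-seven chance quoted in the example.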

Even a small chance of destroying an important document will deter tampering to some extent,
but some tamperers might be willing to run that risk. Still more troubling is the possibility that
the tamperer might first flood the system with junk documents, so that almost all real documents
are entangled only with junk. Since our bounds show that destruction of a typical document will
on average affect only a handful of others in Dagster and almost none in Tangler, we will need
stronger entanglement mechanisms if entanglement is to deter tampering by itself.

3  Our  model 

In Section 3.1, we start by giving a basic framework for modeling systems such as Dagster and
Tangler that entangle data. Specializing the general framework gives three specific system models,
differentiated by the choice of recovery algorithms. We discuss these models, along with different
kinds of adversaries that we consider, in Section 3.2.

As we have mentioned, Dagster and Tangler use many worthwhile techniques in conjunction
with entanglement to provide censorship resistance. Our model abstracts away many such details
of the storage and recovery processes. Instead, we concentrate on a single entanglement operation
performed by these systems. An entanglement operation takes documents of a finite group of users
and intertwines these documents to form a common store. The server contents are computed as an
aggregation of common stores from multiple entanglement operations. We number the group's users
$\{1, \ldots, n\}$ and assume every user $i$ possesses a document $d_i$ that he wants to publish.³ With the
help of the model, we study the security provided by the entanglement operation in and of itself. Our
model can potentially be expanded, albeit at the expense of significantly increased complexity. One
such extension could be a multiround model, which would look at multiple entanglement operations
rather than just one. We discuss some possible extensions to our model in Section 7.

³In practice, a censorship-resistant system should provide some degree of anonymity to its users. The numbers we
assign to users play the role of pseudonyms. While the adversary may be able to distinguish documents owned by
user $i$ from documents owned by user $j$, he should not be able to tell who the real people behind the pseudonyms are.
Further, if the adversary becomes biased against user $i$ and decides to delete all his data, our mechanisms will ensure
that the adversary will also destroy the data of many other users. We refer the reader to [21] for further discussion
of anonymizing censorship-resistant systems.


3.1  Basic  framework 


Our  model  consists  of  an  initialization  phase,  in  which  keys  are  generated  and  distributed  to 
the  various  participants  in  the  system;  an  entanglement  phase,  in  which  the  individual  users’ 
data  are  combined  into  a  common  store;  a  tampering  phase,  in  which  the  adversary  corrupts  the 
store;  and  a  recovery  phase,  in  which  the  users  attempt  to  retrieve  their  data  from  the  corrupted 
store  using  one  or  more  recovery  algorithms. 


Figure  2:  Initialization,  entanglement,  and  tampering  stages. 

An  encoding  scheme  consists  of  three  probabilistic  Turing  machines,  (I,E,R),  that  run  in 
time  polynomial  in  the  size  of  their  inputs  and  a  security  parameter  s.  The  first  of  these,  the 
initialization  algorithm  I,  hands  out  the  keys  used  in  the  encoding  and  recovery  phases.  The 
second,  the  encoding  algorithm  E,  combines  the  users’  data  into  a  common  store  using  the 
encoding  key.  The  third,  the  recovery  algorithm  R,  attempts  to  recover  each  user’s  data  using 
the  appropriate  recovery  key. 

Acting against the encoding scheme is an adversary $(\breve{I}, \breve{T}, \breve{R})$, which also consists of three
probabilistic polynomial-time Turing machines. The first is an adversary-initialization algorithm
$\breve{I}$; like the good initializer $I$, the evil $\breve{I}$ is responsible for generating keys used by other parts of
the adversary during the protocol. The second is a tampering algorithm $\breve{T}$, which modifies the
common store. The third is a non-standard recovery algorithm $\breve{R}$, which may be used by some
or all of the users to recover their data from the modified store.

We assume that $\breve{I}$, $\breve{T}$, and $\breve{R}$ are chosen after $I$, $E$, and $R$ are known but that a single fixed $\breve{I}$,
$\breve{T}$, and $\breve{R}$ are used for arbitrarily large values of $s$ and $n$. This is necessary for polynomial-time
bounds on $\breve{T}$ and $\breve{R}$ to have any effect. We also assume the security parameter $s$ is chosen large
enough that the number of users is polynomial in $s$.

Given an encoding scheme $(I, E, R)$ and an adversary $(\breve{I}, \breve{T}, \breve{R})$, the storage protocol proceeds
as follows (see also Figure 2):

1. Initialization. The initializer $I$ generates a combining key $k_E$ used by the encoding algorithm
and recovery keys $k_1, k_2, \ldots, k_n$, where each key $k_i$ is used by the recovery algorithm to recover
the data for user $i$. At the same time, the adversary initializer $\breve{I}$ generates the shared key $\breve{k}$
for $\breve{T}$ and $\breve{R}$.

\[ k_E, k_1, k_2, \ldots, k_n \leftarrow I(1^s, n), \]
\[ \breve{k} \leftarrow \breve{I}(1^s, n). \]

2. Entanglement. The encoding algorithm $E$ computes the combined store $C$ from the
combining key $k_E$ and the data $d_i$:

\[ C \leftarrow E(k_E, d_1, d_2, \ldots, d_n). \]

3. Tampering. The tamperer $\breve{T}$ alters the combined store $C$ into $\breve{C}$:

\[ \breve{C} \leftarrow \breve{T}(\breve{k}, C). \]

4. Recovery. The users attempt to recover their data. User $i$ applies his recovery algorithm
$R_i$ to $k_i$ and the changed store $\breve{C}$. Each $R_i$ could be either the standard recovery algorithm
$R$, supplied with the encoding scheme, or the non-standard algorithm $\breve{R}$, supplied by the
adversary, depending on the choice of the model.

\[ d'_i \leftarrow R_i(k_i, \breve{C}). \]

We say that user $i$ recovers his data if the output of $R_i$ equals $d_i$.
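
The four phases above can be sketched as a small driver function. This is only an illustrative skeleton under stated assumptions: the function and parameter names (`run_protocol`, `adv_T`, `use_adv_R`, the `toy_*` helpers) are hypothetical and not from the paper, and the toy instantiation stores documents in the clear purely to exercise the control flow.

```python
from typing import Any, Callable, List, Sequence


def run_protocol(I: Callable, E: Callable, R: Callable,
                 adv_I: Callable, adv_T: Callable, adv_R: Callable,
                 docs: Sequence[Any], use_adv_R: Sequence[bool],
                 s: int = 128) -> List[bool]:
    """Run one execution; entry i is True when user i recovers d_i."""
    n = len(docs)
    k_E, keys = I(s, n)               # 1. initialization (encoding scheme)
    adv_k = adv_I(s, n)               #    initialization (adversary's shared key)
    C = E(k_E, docs)                  # 2. entanglement
    C_tampered = adv_T(adv_k, C)      # 3. tampering
    recovered = []
    for i in range(n):                # 4. recovery, per user
        if use_adv_R[i]:
            out = adv_R(adv_k, keys[i], C_tampered)
        else:
            out = R(keys[i], C_tampered)
        recovered.append(out == docs[i])
    return recovered


# Toy instantiation (hypothetical, offers no security): the store is the list
# of documents, user i's key is the index i, and the tamperer erases slot 0.
def toy_I(s, n): return None, list(range(n))
def toy_E(k_E, docs): return list(docs)
def toy_R(k_i, C): return C[k_i]
def toy_adv_I(s, n): return None
def toy_adv_T(adv_k, C): return [None] + list(C[1:])
def toy_adv_R(adv_k, k_i, C): return C[k_i]


result = run_protocol(toy_I, toy_E, toy_R, toy_adv_I, toy_adv_T, toy_adv_R,
                      [b"a", b"b", b"c"], [False, False, False])
```

Setting every entry of `use_adv_R` to `False` corresponds to all users running the standard recovery algorithm; other choices model users who adopt the adversary's algorithm.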

3.2  Adversary  model 

We  divide  our  model  on  two  axes:  one  bounding  the  ability  of  the  reconstruction  algorithms  and 
the  other  bounding  the  ability  of  the  adversary  to  modify  the  data  store.  With  respect  to  recovery 
algorithms,  we  consider  three  variants  on  the  basic  framework  (listed  in  order  of  increasing  power 
given  to  the  adversary): 

• In the standard-recovery-algorithm model, the users are restricted to a single standard
recovery algorithm $R$, supplied by the system designer. Formally, this means $R_i = R$ for
all users $i$; the adversary's recovery algorithm $\breve{R}$ is not used. This is the model adopted by
Dagster and Tangier.

• In the public-recovery-algorithm model, the adversary not only modifies the combined
store, but also supplies a single non-standard recovery algorithm $\breve{R}$ to all of the users.
Formally, we have $R_i = \breve{R}$ for each $i$. The original recovery algorithm $R$ is not used.4

We  call  this  an  upgrade  attack  by  analogy  to  the  real  life  situation  of  a  company  changing 
the  data  format  of  documents  processed  by  its  software  and  distributing  a  new  version  of 
the  software  to  read  them.  We  believe  such  an  attack  is  a  realistic  possibility,  because  most 
self-interested  users  will  be  happy  to  adopt  a  new  recovery  algorithm  if  it  offers  new  features 
or  performance,  or  if  the  alternative  is  losing  their  data. 

• In the private-recovery-algorithm model, the adversary may choose to supply the
non-standard recovery algorithm $\breve{R}$ to only a subset of the users. The rest continue to use the
standard algorithm $R$. Formally, this model is a mix of the previous two models: $R_i = \breve{R}$ for
some $i$ and $R_i = R$ for others.

We  also  differentiate  between  two  types  of  tamperers: 

•  An  arbitrary  tamperer  can  freely  corrupt  the  data  store  and  is  not  restricted  in  any  way. 
Most  real-life  systems  fit  into  this  category  as  they  place  no  restrictions  on  the  tamperer. 

•  A  destructive  tamperer  can  only  apply  a  transformation  to  the  store  whose  range  of 
possible  outputs  is  substantially  smaller  than  the  set  of  inputs.  The  destructive  tamperer  can 
superimpose  its  own  encryption  on  the  common  store,  transform  the  store  in  arbitrary  ways, 
and  even  add  additional  data,  provided  that  the  cumulative  effect  of  all  these  operations  is 
to  decrease  the  entropy  of  the  data  store. 

For example, such an adversary might erase parts of the data store without being able to
selectively modify the store's contents, a restriction that could be implemented using write-once
media.  Though  requiring  only  destructive  tampering  may  look  like  an  artificial  restriction, 
we  will  need  it  to  achieve  all-or-nothing  integrity  in  the  private-recovery-algorithm  model. 

Recall that an adversary is a tuple $(\breve{I}, \breve{T}, \breve{R})$, which consists of an initializer $\breve{I}$, a tamperer $\breve{T}$,
and a non-standard recovery algorithm $\breve{R}$. An adversary class specifies what kind of tamperer
$\breve{T}$ is and which users, if any, receive $\breve{R}$ as their recovery algorithm. Altogether, we consider 6 (=
3 × 2) adversary classes, each corresponding to a combination of the aforementioned constraints on the
tamperer  and  the  recovery  algorithms.  We  shall  see  that  the  possibility  of  achieving  all-or-nothing 
integrity  ultimately  depends  on  the  adversary  class  we  consider. 

4  Dependency  and  all-or-nothing  integrity 

We  now  give  our  definition  of  data  dependency  for  a  particular  encoding  scheme  and  adversary  class. 
We  first  discuss  some  preliminary  notions  in  Section  4.1.  Our  stronger  notions  of  entanglement, 
called  dependency  and  all-or-nothing  integrity,  are  defined  formally  in  Section  4.2. 

4 Though it may seem unreasonable to prevent users from choosing the original recovery algorithm $R$, any $R$ can
be rendered useless in practice by superencrypting the data store and distributing the decryption key only with the
adversary's $\breve{R}$. We further discuss this issue in Section 5.2.

4.1 Preliminaries

Because  we  consider  protocols  involving  probabilistic  Turing  machines,  we  must  be  careful  in  talking 
about probabilities. Fix an encoding $(I, E, R)$, an adversary $A = (\breve{I}, \breve{T}, \breve{R})$, and the recovery
algorithm $R_i$ for each user $i$. An execution of the resulting system specifies the inputs $k_E$ and
$d_i$ to $E$, the output of $E$, the tamperer's input $\breve{k}$ and output $\breve{C}$, and the output of the recovery
algorithm $R_i$ ($R(k_i, \breve{C})$ or $\breve{R}(\breve{k}, k_i, \breve{C})$ as appropriate) for each user. The set of possible executions
of  the  storage  system  is  assigned  probabilities  in  the  obvious  way:  the  probability  of  an  execution 
is  taken  over  the  inputs  to  the  storage  system  and  the  coin  tosses  of  the  encoding  scheme  and  the 
adversary. 

It  will  be  convenient  to  consider  multiple  adversaries  with  a  fixed  encoding  scheme.  In  this  case, 
we use $\Pr_A[Q]$ to denote the probability that an event $Q$ occurs when $A$ is the adversary.

During an execution of the storage system, the tamperer alters the combined store from $C$ into
$\breve{C}$. As a result, some users end up recovering their documents while others do not. The recovery
vector  of  an  execution  specifies  which  documents  were  successfully  recovered  in  that  execution. 
Formally: 

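Definition 5 The recovery vector of execution $\alpha$, denoted $\rho(\alpha)$, is a bit vector

\[ \rho(\alpha) = (\rho_1, \rho_2, \ldots, \rho_n), \]

where

\[ \rho_i = \begin{cases} 1 & \text{if } R_i(k_i, \breve{C}) = d_i, \\ 0 & \text{otherwise.} \end{cases} \]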


To illustrate, suppose the server contains three documents: $d_1$, $d_2$, and $d_3$. If in execution $\alpha$ we
recover only documents $d_1$ and $d_2$, we then have $\rho(\alpha) = 110$.

When we think of $\alpha$ as a random variable (having fixed a particular encoding algorithm and
adversary), we will use $\vec{r}$ as a shorthand for the random variable $\rho(\alpha)$.
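
As a small illustration of Definition 5, the recovery vector can be computed by comparing each user's recovery output with the original document. The helper name `recovery_vector` and the use of `None` for a failed recovery are assumptions made for this sketch.

```python
from typing import Optional, Sequence


def recovery_vector(outputs: Sequence[Optional[bytes]],
                    docs: Sequence[bytes]) -> str:
    """Bit i is '1' exactly when user i's recovery output equals d_i."""
    return "".join("1" if out == d else "0" for out, d in zip(outputs, docs))


docs = [b"d1", b"d2", b"d3"]
outputs = [b"d1", b"d2", None]   # the third user fails to recover d3
rho = recovery_vector(outputs, docs)
```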


4.2  Our  notions  of  entanglement 

In  Section  2,  we  observed  that  the  block-sharing  notion  of  entanglement  provided  by  Dagster  and 
Tangier  does  not  by  itself  provide  strong  security  guarantees.  Two  documents  may  be  entangled 
together  in  this  sense  even  though  it  is  still  possible  to  delete  one  of  them  without  affecting  the  other. 
Block-sharing  entanglement  also  makes  sense  only  in  the  context  of  a  system  that  uses  block-sharing 
and  requires  observing  the  internal  workings  of  the  system.  We  would  like  a  definition  that  applies 
to  a  much  wider  class  of  potential  systems,  which  requires  considering  only  externally-observable 
properties. 

This  motivates  us  to  propose  the  notion  of  document  dependency.  Document  dependency 
formalizes  the  idea  that  “if  my  data  depends  on  yours,  I  can’t  get  my  data  back  if  you  can’t.” 
In  this  way,  the  fates  of  specific  documents  become  linked  together:  specifically,  if  document  di 
depends  on  document  dj,  then  whenever  dj  cannot  be  recovered  neither  can  di. 

Given just one execution, either users $i$ and $j$ each get their data back or they don't. So how
can we say that the particular outcome for $i$ depends on the outcome for $j$? Essentially, we are
saying that we are happy with executions in which either $j$ recovers its data (whether or not $i$ does)
or in which $j$ does not recover its data and $i$ does not either. Executions in which $j$ does not recover
its data but $i$ does are bad executions in this sense. We will try to exclude these bad executions
by saying that they either never occur or occur only with very low probability.

Formally,  in  terms  of  a  recovery  vector,  this  becomes: 

Definition 6 We say that $d_i$ depends on $d_j$ with respect to a class of adversaries $\mathcal{A}$, denoted
$d_i \xrightarrow{\mathcal{A}} d_j$, if, for all adversaries $A \in \mathcal{A}$,

\[ \Pr_A[r_i = 0 \lor r_j = 1] \ge 1 - \epsilon. \]

Remark:  Hereafter,  e  refers  to  a  negligible  function  in  the  security  parameter  s.5 

The  ultimate  form  of  dependency  is  all-or-nothing  integrity.  Intuitively,  a  storage  system  is 
all-or-nothing  if  either  every  user  i  recovers  his  data  or  no  user  does: 

Definition 7 A storage system is all-or-nothing with respect to a class of adversaries $\mathcal{A}$ if, for
all $A \in \mathcal{A}$,

\[ \Pr_A[\vec{r} = 0^n \lor \vec{r} = 1^n] \ge 1 - \epsilon. \]

It  is  perhaps  not  surprising  that  we  have  the  all-or-nothing  property  if  and  only  if  all  documents 
depend  on  each  other: 

Lemma 8 A storage system is all-or-nothing with respect to a class of adversaries $\mathcal{A}$ if and only
if, for all users $i, j$, $d_i \xrightarrow{\mathcal{A}} d_j$.

Proof: Fix an adversary $A \in \mathcal{A}$. Let $E$ be the event that an execution of the storage system is
not all-or-nothing, and $F_{ij}$ the event that document $d_i$ was recovered in an execution and $d_j$ was
not. Then $E = \{\vec{r} \ne 0^n \land \vec{r} \ne 1^n\}$ and $F_{ij} = \{r_i = 1 \land r_j = 0\}$.

($\Rightarrow$): If the system is all-or-nothing, then $\Pr[E] \le \epsilon$. Clearly, for all $i, j$, we have $F_{ij} \subseteq E$, which
means $\Pr[F_{ij}] \le \Pr[E] \le \epsilon$. This in turn implies $d_i \xrightarrow{\mathcal{A}} d_j$.

($\Leftarrow$): If for all $i, j$, $d_i \xrightarrow{\mathcal{A}} d_j$, then $\Pr[F_{ij}] \le \epsilon$. We can choose $\epsilon \le \epsilon'/n^2$ for a negligible $\epsilon'$.
Notice that $E \subseteq \bigcup_{i,j} F_{ij}$. Therefore, $\Pr[E] \le \sum_{i,j} \Pr[F_{ij}] \le n^2 \epsilon \le \epsilon'$. Hence, $\Pr[E^c] \ge 1 - \epsilon'$
and so the storage system is all-or-nothing. ∎
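
The set relation used in the proof, namely that an execution fails to be all-or-nothing exactly when some bad pair $(i, j)$ with $r_i = 1$ and $r_j = 0$ occurs, can be checked by brute force for small $n$. The following is only a sanity check over all bit vectors of length 4; the helper names are hypothetical.

```python
from itertools import product


def is_all_or_nothing(r):
    """True when the recovery vector is 0^n or 1^n."""
    return all(b == 0 for b in r) or all(b == 1 for b in r)


def bad_pairs(r):
    """Pairs (i, j) with r_i = 1 and r_j = 0, i.e. the events F_ij that occur."""
    n = len(r)
    return [(i, j) for i in range(n) for j in range(n)
            if r[i] == 1 and r[j] == 0]


n = 4
for r in product((0, 1), repeat=n):
    # The event E (not all-or-nothing) coincides with the union of the F_ij.
    assert (not is_all_or_nothing(r)) == (len(bad_pairs(r)) > 0)

# At most n^2 events F_ij can occur at once, matching the union bound's n^2 factor.
max_bad = max(len(bad_pairs(r)) for r in product((0, 1), repeat=n))
```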

All-or-nothing  integrity  is  a  very  strong  property.  In  some  models,  we  may  not  be  able  to 
achieve  it,  and  we  will  accept  a  weaker  property  called  symmetric  recovery.  Symmetric  recovery 
requires  that  all  users  recover  their  documents  with  equal  probability: 

Definition 9 A storage system has symmetric recovery with respect to a class of adversaries $\mathcal{A}$
if, for all $A \in \mathcal{A}$ and all users $i$ and $j$,

\[ \Pr_A[r_i = 1] = \Pr_A[r_j = 1]. \]

Symmetric  recovery  says  nothing  about  what  happens  in  particular  executions.  For  example,  it 
is  consistent  with  the  definition  for  exactly  one  of  the  data  items  to  be  recovered  in  every  execution, 
as  long  as  the  adversary  cannot  affect  which  data  item  is  recovered.  This  is  not  as  strong  a  property 
as  all-or-nothing  integrity,  but  it  is  the  best  that  can  be  done  in  some  cases. 

5 A function $\epsilon : \mathbb{N} \to (0, 1)$ is negligible if for every $c > 0$, for all sufficiently large $s$, $\epsilon(s) < 1/s^c$. See any standard
reference, such as [11], for details.

5 Possibility and impossibility results


The  possibility  of  achieving  all-or-nothing  integrity  (abbreviated  AONI)  depends  on  the  class 
of  adversaries  we  consider. 

In  Sections  5.1  through  5.3,  we  consider  adversaries  with  an  arbitrary  tamperer.  We  show 
that 

• In the standard-recovery-algorithm model, a simple application of Message Authentication
Codes (MACs) achieves all-or-nothing integrity.

•  In  the  public-recovery-algorithm  model,  all-or-nothing  integrity  is  no  longer  possible. 
The  best  that  can  be  done  is  to  prevent  the  adversary  from  targeting  specific  users  by  hiding 
the  location  of  each  user’s  data  within  the  store. 

• In the private-recovery-algorithm model, even the weak guarantees of the
public-recovery-algorithm model are no longer possible, because the adversary can superencrypt the data
store and refuse to distribute the decryption key to users he doesn't like.

In Section 5.4, we look at adversaries with a destructive tamperer. We give a simple interpolation
scheme that achieves all-or-nothing integrity for a destructive tamperer in all three recovery
models.

5.1  Possibility  of  AONI  for  standard-recovery-algorithm  model 

In  the  standard-recovery-algorithm  model,  all  users  use  the  standard  recovery  algorithm  R;  that  is 
Ri  =  R  for  all  users  i.  Both  Dagster  and  Tangier  assume  this  model. 

This model allows a very simple mechanism for all-or-nothing integrity based on Message
Authentication Codes (MACs). The intuition behind this mechanism is that the encoding algorithm
E  simply  tags  the  data  store  with  a  MAC  using  a  key  known  to  all  the  users,  and  the  recovery 
algorithm  R  returns  an  individual  user’s  data  only  if  the  MAC  on  the  entire  database  is  valid. 

We  begin  by  recalling  some  standard  definitions  [12]: 

Definition 10 A MAC consists of three algorithms (GEN, TAG, VER) such that:

1. A key generator $\mathrm{GEN} : 1^s \mapsto k_{MAC}$ generates an $s$-bit key on input a security parameter
$s$ (encoded in unary);

2. A tagging algorithm $\mathrm{TAG} : (k_{MAC}, m) \mapsto \sigma$ computes a MAC $\sigma$ for any message $m$ with
$|m| \le s^c$ for some fixed $c$; and

3. A verification algorithm $\mathrm{VER} : (k_{MAC}, m, \sigma) \mapsto \{\text{accept}, \text{reject}\}$, with the property that

\[ \mathrm{VER}(k_{MAC}, m, \mathrm{TAG}(k_{MAC}, m)) = \text{accept} \quad \text{for all } m. \]

Definition 11 A MAC scheme is said to be existentially unforgeable against chosen message
attacks if there is no polynomial-time forger $F$ that generates a new message-MAC pair $(m', \sigma')$
that is accepted by VER with probability exceeding $O(s^{-c})$ for any $c > 0$, even if $F$ is given a
sample of valid message-signature pairs $(m_i, \sigma_i)$, where $m_i$ is chosen by the adversary. Here the
requirement that $(m', \sigma')$ is “new” means simply that $m'$ is not equal to any of the supplied $m_i$.

We now show that MACs that are existentially unforgeable against chosen message attacks give
all-or-nothing integrity with standard recovery algorithms. The encoding scheme is as follows:

Initialization The initialization algorithm $I$ computes $k_{MAC} = \mathrm{GEN}(1^s)$. It then returns an
encoding key $k_E = k_{MAC}$ and recovery keys $k_i = (i, k_{MAC})$.

Entanglement The encoding algorithm $E$ generates an $n$-tuple
$m = (d_1, d_2, \ldots, d_n)$ and returns $C = (m, \sigma)$ where $\sigma = \mathrm{TAG}(k_{MAC}, m)$.

Recovery The standard recovery algorithm $R$ takes as input a key $k_i = (i, k_{MAC})$ and the (possibly
modified) store $\breve{C} = (\breve{m}, \breve{\sigma})$. It returns $\breve{m}_i$ if $\mathrm{VER}(k_{MAC}, \breve{m}, \breve{\sigma}) = \text{accept}$ and returns a default
value $\bot$ otherwise.
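
The three algorithms above can be sketched with a concrete MAC. The sketch below assumes HMAC-SHA256 as the MAC scheme, which is our choice for illustration and not something the paper specifies; the function names, the length-prefixed serialization of the tuple $m$, and the use of `None` for $\bot$ are likewise assumptions.

```python
import hashlib
import hmac
import secrets
from typing import List, Optional, Sequence, Tuple

Store = Tuple[List[bytes], bytes]   # C = (m, sigma)


def _serialize(docs: Sequence[bytes]) -> bytes:
    # Length-prefix each document so the encoding of the tuple is unambiguous.
    return b"".join(len(d).to_bytes(8, "big") + d for d in docs)


def tag(k_mac: bytes, docs: Sequence[bytes]) -> bytes:
    return hmac.new(k_mac, _serialize(docs), hashlib.sha256).digest()


def ver(k_mac: bytes, docs: Sequence[bytes], sigma: bytes) -> bool:
    return hmac.compare_digest(tag(k_mac, docs), sigma)


def initialize(n: int) -> Tuple[bytes, List[Tuple[int, bytes]]]:
    k_mac = secrets.token_bytes(32)                  # GEN
    return k_mac, [(i, k_mac) for i in range(n)]     # k_E and the k_i = (i, k_MAC)


def encode(k_E: bytes, docs: Sequence[bytes]) -> Store:
    m = list(docs)
    return m, tag(k_E, m)


def recover(k_i: Tuple[int, bytes], store: Store) -> Optional[bytes]:
    i, k_mac = k_i
    m, sigma = store
    # Return user i's entry only if the MAC over the whole store verifies.
    return m[i] if ver(k_mac, m, sigma) else None


docs = [b"alpha", b"beta", b"gamma"]
k_E, keys = initialize(len(docs))
C = encode(k_E, docs)
intact = [recover(k, C) for k in keys]
tampered = ([b"alpha", b"XXXX", b"gamma"], C[1])    # modify one document only
after = [recover(k, tampered) for k in keys]
```

Modifying a single document invalidates the tag over the whole tuple, so every user's recovery returns the default value, which is the all-or-nothing behavior the construction aims for.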

The  following  theorem  states  that  this  encoding  scheme  is  all-or-nothing: 

Theorem 12 Let (GEN, TAG, VER) be a MAC scheme that is existentially unforgeable against
chosen message attacks, and let $(I, E, R)$ be an encoding scheme based on this MAC scheme as
above. Let $\mathcal{A}$ be the class of adversaries that does not provide non-standard recovery algorithms $\breve{R}$.
Then there exists some minimum $s_0$ such that for any security parameter $s \ge s_0$ and any inputs
$d_1, \ldots, d_n$ with $\sum_i |d_i| \le s$, $(I, E, R)$ is all-or-nothing with respect to $\mathcal{A}$.

Proof: Fix some $c > 0$. Recall that the adversary changes the combined store from $C = (m, \sigma)$
to $\breve{C} = (\breve{m}, \breve{\sigma})$. We consider two cases, depending on whether or not $\breve{m} = m$.

In the first case, $\breve{m} = m$. Suppose $R(k_i, \breve{C}) = d_i$ but $R(k_j, \breve{C}) \ne d_j$. Then $R(k_j, \breve{C}) = \bot$,
which implies that $\mathrm{VER}(k_{MAC}, m, \breve{\sigma}) \ne \text{accept}$ when computed by $R(k_j, \breve{C})$ and thus that $\breve{\sigma} \ne \sigma$. But
$R(k_i, \breve{C}) = d_i$ only if $\mathrm{VER}(k_{MAC}, m, \breve{\sigma}) = \text{accept}$ when computed by $R(k_i, \breve{C})$. It follows that $(m, \breve{\sigma})$
is a message-MAC pair not equal to $(m, \sigma)$ that VER accepts in the execution of $R(k_i, \breve{C})$; by the
security assumption this occurs for a particular execution of VER only with probability $O(s^{-c'})$ for
any fixed $c'$. If we choose $c'$ and $s_0$ so that the $O(s^{-c'})$ term is smaller than $\frac{1}{2n} s^{-c}$ for $s \ge s_0$, then
the probability that any of the $n$ executions of VER in the recovery stage accepts $(m, \breve{\sigma})$ in some case
where $m = \breve{m}$ is bounded by $\frac{1}{2} s^{-c}$.

In the second case, $\breve{m} \ne m$. Now $(\breve{m}, \breve{\sigma})$ is a message-MAC pair not equal to $(m, \sigma)$. If every
execution of VER rejects $(\breve{m}, \breve{\sigma})$, then all $R(k_i, \breve{C})$ return $\bot$ and the execution has recovery vector $0^n$.
The only bad case is when at least one execution of VER erroneously accepts $(\breve{m}, \breve{\sigma})$. But using the
security assumption and choosing $c'$ and $s_0$ as in the previous case, we again have that the probability
that VER accepts $(\breve{m}, \breve{\sigma})$ in any of the $n$ executions of $R$ is at most $\frac{1}{2} s^{-c}$.

Summing the probabilities of the two bad cases gives us the desired bound:
$\Pr_A[\vec{r} = 0^n \lor \vec{r} = 1^n] \ge 1 - s^{-c}$. ∎

5.2  Impossibility  of  AONI  for  public  and  private-recovery-algorithm  models 

In both these models, the adversary modifies the common store and distributes a non-standard
recovery algorithm $\breve{R}$ to the users. In the public-recovery-algorithm model, all users get the new
$\breve{R}$, whereas in the private-recovery-algorithm model, $\breve{R}$ is given only to a select few users favored
by the adversary. Let us begin by showing that all-or-nothing integrity cannot be achieved consistently in
either case:

Theorem 13 For any encoding scheme $(I, E, R)$, if $\mathcal{A}$ is the class of adversaries providing
non-standard recovery algorithms $\breve{R}$, then $(I, E, R)$ is not all-or-nothing with respect to $\mathcal{A}$.

Proof: Let the adversary initializer $\breve{I}$ be a no-op and let the tamperer $\breve{T}$ be the identity
transformation. We will rely entirely on the non-standard recovery algorithm to destroy
all-or-nothing integrity.

Let $\breve{R}$ flip a biased coin that comes up tails with probability $1/n$, and return the result of
running $R$ on its input if the coin comes up heads and $\bot$ if the coin comes up tails. Then exactly
one document is not returned with probability $n \cdot (1/n) \cdot (1 - 1/n)^{n-1}$, which converges to $1/e$ in
the limit. Because this document is equally likely to be any of the $n$ documents by symmetry, each
recovery vector with exactly one zero entry occurs with (non-negligible) probability about $1/(en)$.

The outcome is all-or-nothing only if all instances of $\breve{R}$ flip the same way, which occurs with
probability $\Pr_A[\vec{r} = 0^n \lor \vec{r} = 1^n] \le 1 - 1/(en)$. ∎
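
The probabilities in this proof are easy to check numerically. The sketch below assumes, as the proof implicitly does, that the standard algorithm $R$ always succeeds on the untampered store, so each user independently fails with probability $1/n$; the function names are ours.

```python
import math


def prob_exactly_one_missing(n: int) -> float:
    """n * (1/n) * (1 - 1/n)^(n-1): exactly one of n independent coins is tails."""
    return n * (1 / n) * (1 - 1 / n) ** (n - 1)


def prob_all_or_nothing(n: int) -> float:
    """All n coins come up heads, or all n come up tails."""
    return (1 - 1 / n) ** n + (1 / n) ** n


rows = []
for n in (2, 10, 100, 1000):
    rows.append((n,
                 prob_exactly_one_missing(n),    # tends to 1/e as n grows
                 prob_all_or_nothing(n),         # stays bounded away from 1
                 1 - 1 / (math.e * n)))          # the bound stated in the proof
```

Each row holds $n$, the probability that exactly one document is lost, the probability of an all-or-nothing outcome, and the upper bound $1 - 1/(en)$ for comparison.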

The  proof  of  Theorem  13  is  rather  trivial,  which  suggests  that  letting  the  adversary  substitute 
an  error-prone  recovery  algorithm  in  place  of  the  standard  one  gives  the  adversary  far  too  much 
power.  But  it  is  not  at  all  clear  how  to  restrict  the  model  to  allow  the  adversary  to  provide  an 
improved  recovery  algorithm  without  allowing  for  this  particular  attack. 

One possibility would be to allow users to choose between applying the original recovery
algorithm and the adversary's new and improved version; but in practice this approach is easily
defeated by a tamperer $\breve{T}$ that encrypts $C$ (which renders $\breve{C}$ unusable as input to $R$) coupled with
an error-prone $\breve{R}$ that reverses the encryption (when its coin comes up heads) before applying $R$.

A more sophisticated approach would be to allow $R$ to analyze $\breve{R}$ to attempt to undo whatever
superencryption may have been performed and extract a recovery algorithm that works all the
time. Unfortunately, this approach depends on being able to extract useful information about the
workings of an arbitrary Turing machine. While it has been shown that program obfuscation is
impossible in general [2], even in a specialized form this operation is likely to be very difficult,
especially if the random choice to decrypt incorrectly is not a single if-then test but is the result of
accumulating error distributed throughout the computation of $\breve{R}$.

On the other hand, we do not know of any general mechanism to ensure that no useful
information can be gleaned from $\breve{R}$, and it is not out of the question that there is an encoding so
transparent that no superencryption can disguise it for sufficiently large inputs, given that both $\breve{R}$
and the adversary's key $\breve{k}$ are public.

5.3  Possibility  of  symmetric  recovery  for  public-recovery-algorithm  model 

As  we  have  seen,  if  we  place  no  restrictions  on  the  tamperer,  it  becomes  impossible  to  achieve 
all-or-nothing  integrity  in  the  public-recovery-algorithm  model.  We  now  show  that  we  can  still 
achieve  symmetric  recovery. 

Because we cannot prevent mass destruction of data, we will settle for preventing targeted
destruction. The basic intuition is that if the encoding process is symmetric with respect to
permutations of the data, then neither the adversary nor its partner, the non-standard recovery algorithm,
can distinguish between different inputs. Symmetry in the encoding algorithm is not difficult to
achieve and basically requires not including any positional information in the keys or the
representation of data in the common store. One example of symmetric encodings is a trivial mechanism
that tags each input $d_i$ with a random $k_i$ and then stores a sequence of $(d_i, k_i)$ pairs in random
order.
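
The trivial symmetric mechanism just described can be sketched in a few lines. The key length, the function names, and the use of `None` for a failed lookup are assumptions of this sketch; it offers no integrity protection and only illustrates that neither the keys nor the store carry positional information.

```python
import random
import secrets
from typing import List, Optional, Sequence, Tuple


def initialize(n: int, key_bytes: int = 16) -> List[bytes]:
    """One random tag k_i per user; the tags carry no positional information."""
    return [secrets.token_bytes(key_bytes) for _ in range(n)]


def encode(keys: Sequence[bytes], docs: Sequence[bytes]) -> List[Tuple[bytes, bytes]]:
    """Store the (d_i, k_i) pairs in a uniformly random order."""
    pairs = list(zip(docs, keys))
    random.SystemRandom().shuffle(pairs)
    return pairs


def recover(k_i: bytes, store: Sequence[Tuple[bytes, bytes]]) -> Optional[bytes]:
    """Return the document tagged with k_i, or None if the tag is absent."""
    for d, k in store:
        if k == k_i:
            return d
    return None


docs = [b"first", b"second", b"third"]
keys = initialize(len(docs))
C = encode(keys, docs)
recovered = [recover(k, C) for k in keys]
```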

Symmetry in the data is a stronger requirement. For the moment, we assume that users'
documents $d_i$ are independent and identically distributed (i.i.d.) random variables. If documents
are not i.i.d. (in particular, if they are fixed), we can use a simple trick to make them i.i.d.: Each
user $i$ picks a small number $r_i$ independently and uniformly at random, remembers the number,
and computes $d'_i = d_i \oplus G(r_i)$, where $G$ is a pseudorandom generator. The new $d'_i$ are also uniform
and independent (and thus computationally indistinguishable from i.i.d.). The users can then store
documents $d'_i$ ($1 \le i \le n$) instead of the original documents $d_i$. To recover $d_i$, user $i$ would retrieve
$d'_i$ from the server and compute $d_i = d'_i \oplus G(r_i)$.
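
The masking trick can be sketched as follows. The paper does not fix a particular generator $G$; this sketch assumes SHAKE-256 as a stand-in, and the seed length and function names are illustrative choices.

```python
import hashlib
import secrets
from typing import Tuple


def prg(seed: bytes, out_len: int) -> bytes:
    """Stand-in for the pseudorandom generator G (SHAKE-256 is an assumption)."""
    return hashlib.shake_256(seed).digest(out_len)


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def mask(d: bytes, seed_bytes: int = 16) -> Tuple[bytes, bytes]:
    """Pick a random seed r_i and return (r_i, d'_i) with d'_i = d_i XOR G(r_i)."""
    r = secrets.token_bytes(seed_bytes)
    return r, xor(d, prg(r, len(d)))


def unmask(r: bytes, d_masked: bytes) -> bytes:
    """Recompute d_i = d'_i XOR G(r_i) from the remembered seed."""
    return xor(d_masked, prg(r, len(d_masked)))


r, d_masked = mask(b"a fixed, non-random document")
restored = unmask(r, d_masked)
```

The user stores `d_masked` on the server and keeps only the short seed `r`.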

We  shall  need  a  formal  definition  of  symmetric  encodings: 

Definition 14 An encoding scheme $(I, E, R)$ is symmetric if, for any $s$ and $n$, any inputs
$d_1, d_2, \ldots, d_n$, and any permutation $\pi$ of the indices 1 through $n$, the joint distribution of $k_1, k_2, \ldots, k_n$
and $C$ in executions with user inputs $d_1, d_2, \ldots, d_n$ is equal to the joint distribution of $k_{\pi 1}, k_{\pi 2}, \ldots, k_{\pi n}$
and $C$ in executions with user inputs $d_{\pi 1}, d_{\pi 2}, \ldots, d_{\pi n}$.

Using  this  definition,  it  is  easy  to  show  that  any  symmetric  encoding  gives  symmetric  recovery: 

Theorem 15 Let $(I, E, R)$ be a symmetric encoding scheme. Let $\mathcal{A}$ be a class of adversaries as
in Theorem 13. Fix $s$ and $n$, and let $d_1, \ldots, d_n$ be random variables that are independent and
identically distributed. Then $(I, E, R)$ has symmetric recovery with respect to $\mathcal{A}$.

Proof: Fix $i$ and $j$. From Definition 14 we have that the joint distribution of the $k_i$ and $C$ is
symmetric with respect to permutation of the user indices; in particular, for any fixed $d$, $S$, and $x$,

\[ \Pr[C = S, k_i = x \mid d_i = d] = \Pr[C = S, k_j = x \mid d_j = d]. \qquad (1) \]

We also have, from the assumption that the $d_i$ are i.i.d.,

\[ \Pr[d_i = d] = \Pr[d_j = d]. \qquad (2) \]

Using (1) and (2), we get

\begin{align*}
\Pr[\breve{R}(\breve{k}, k_i, \breve{T}(C)) = d_i]
&= \sum_{x,S,d} \Pr[\breve{R}(\breve{k}, x, \breve{T}(S)) = d] \, \Pr[C = S, k_i = x, d_i = d] \\
&= \sum_{x,S,d} \Pr[\breve{R}(\breve{k}, x, \breve{T}(S)) = d] \, \Pr[C = S, k_i = x \mid d_i = d] \, \Pr[d_i = d] \\
&= \sum_{x,S,d} \Pr[\breve{R}(\breve{k}, x, \breve{T}(S)) = d] \, \Pr[C = S, k_j = x \mid d_j = d] \, \Pr[d_j = d] \\
&= \Pr[\breve{R}(\breve{k}, k_j, \breve{T}(C)) = d_j].
\end{align*}

This is simply another way of writing $\Pr_A[r_i = 1] = \Pr_A[r_j = 1]$. ∎

5.4 Possibility of AONI for destructive adversaries

Unfortunately,  neither  all-or-nothing  integrity  nor  symmetric  recovery  can  be  achieved  in  the 
private-recovery-algorithm model for an arbitrary tamperer. The adversary can always
superencrypt the data store and distribute a useless recovery algorithm that refuses to return the data.
We  need  to  place  some  additional  restrictions  on  the  adversary. 

Definition 16 A tampering algorithm $\breve{T}$ is destructive if the range of $\breve{T}$ when applied to an input
domain of $m$ distinct possible data stores has size less than $m$.

Remark: The amount of destructiveness is measured in bits: if the range of $\breve{T}$ when applied to
a domain of size $m$ has size $r$, then $\breve{T}$ destroys $\lg m - \lg r$ bits of entropy. Note that it is not
necessarily the case that the outputs of $\breve{T}$ are smaller than its inputs; it is enough that there be
fewer of them.
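
To make the remark concrete, the number of destroyed bits can be computed by enumeration for a toy store. The 8-bit store and the two example tamperers below are hypothetical; the second one produces outputs that are larger than its inputs yet still destroys entropy, because it has fewer distinct outputs.

```python
import math
from typing import Callable, Hashable, Iterable


def destroyed_bits(T: Callable[[int], Hashable], domain: Iterable[int]) -> float:
    """lg m - lg r, where m is the domain size and r the size of T's range."""
    inputs = list(domain)
    outputs = {T(x) for x in inputs}
    return math.log2(len(inputs)) - math.log2(len(outputs))


def erase_low3(x: int) -> int:
    """Zero the three low-order bits of an 8-bit store."""
    return x & ~0b111


def pad_and_collapse(x: int) -> tuple:
    """Drop three bits but append extra data, so outputs are larger than inputs."""
    return (x >> 3, "extra padding added by the tamperer")


stores = range(256)                       # m = 2^8 possible stores
bits_erase = destroyed_bits(erase_low3, stores)
bits_pad = destroyed_bits(pad_and_collapse, stores)
bits_identity = destroyed_bits(lambda x: x, stores)
```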

Below,  we  describe  a  particular  encoding,  based  on  polynomial  interpolation,  with  the  property 
that  after  a  sufficiently  destructive  tampering,  the  probability  that  any  recovery  algorithm  can 
reconstruct  a  particular  di  is  small.  While  this  is  trivially  true  for  an  unrestrained  tamperer  that 
destroys all $\lg m$ bits of the common store, our scheme requires only that with $n$ documents the
tamperer destroy slightly more than $n \lg(n/\epsilon)$ bits before the probability that any of the data can
be recovered drops below $\epsilon$ (a detailed statement of this result is found in Corollary 18). Since
n  counts  only  the  number  of  users  and  not  the  size  of  the  data,  for  a  fixed  population  of  users 
the  number  of  bits  that  can  be  destroyed  before  all  users  lose  their  data  is  effectively  a  constant 
independent  of  the  size  of  the  store  being  tampered  with. 

The encoding scheme is as follows. It assumes that each data item can be encoded as an element
of $Z_p$, where $p$ is a prime of roughly $s$ bits.

Initialization The initialization algorithm $I$ chooses $k_1, k_2, \ldots, k_n$ independently and uniformly
at random without replacement from $Z_p$. It sets $k_E = (k_1, k_2, \ldots, k_n)$ and then returns
$k_E, k_1, \ldots, k_n$.

Entanglement The encoding algorithm $E$ computes, using Lagrange interpolation, the coefficients
$c_{n-1}, c_{n-2}, \ldots, c_0$ of the unique degree $(n-1)$ polynomial $f$ over $Z_p$ with the property that
$f(k_i) = d_i$ for each $i$. It returns $C = (c_{n-1}, c_{n-2}, \ldots, c_0)$.

Recovery The standard recovery algorithm $R$ returns $f(k_i)$, where $f$ is the polynomial whose
coefficients are given by $C$.
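
The three algorithms above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`initialize`, `entangle`, `recover`), the helper `poly_mul_linear`, the choice of the Mersenne prime 2^61 - 1, and the sample data are all hypothetical. It requires Python 3.8 or later for the modular inverse via `pow`.

```python
import random

def initialize(n, p, rng=random):
    """I: draw n distinct keys uniformly from Z_p (sampling without replacement)."""
    keys = rng.sample(range(p), n)
    return tuple(keys), keys          # (k_E, [k_1, ..., k_n])

def poly_mul_linear(poly, root, p):
    """Multiply poly (coefficients listed low-to-high) by (x - root) over Z_p."""
    out = [0] * (len(poly) + 1)
    for i, c in enumerate(poly):
        out[i] = (out[i] - root * c) % p
        out[i + 1] = (out[i + 1] + c) % p
    return out

def entangle(k_E, data, p):
    """E: Lagrange interpolation; returns C = (c_{n-1}, ..., c_0)."""
    n = len(k_E)
    coeffs = [0] * n                  # accumulated low-to-high
    for i in range(n):
        basis, denom = [1], 1
        for j in range(n):
            if j != i:
                basis = poly_mul_linear(basis, k_E[j], p)
                denom = denom * (k_E[i] - k_E[j]) % p
        scale = data[i] * pow(denom, -1, p) % p   # modular inverse (Python 3.8+)
        for t in range(n):
            coeffs[t] = (coeffs[t] + scale * basis[t]) % p
    return coeffs[::-1]               # high-to-low, matching C in the text

def recover(key, store, p):
    """R: evaluate f at the user's key with Horner's rule."""
    acc = 0
    for c in store:
        acc = (acc * key + c) % p
    return acc

# Usage with assumed parameters: four users, p = 2^61 - 1.
p = 2**61 - 1
k_E, user_keys = initialize(4, p, random.Random(0))
data = [11, 22, 33, 44]
C = entangle(k_E, data, p)
recovered = [recover(k, C, p) for k in user_keys]
```

Each user's key returns that user's own item, while the store C consists of n coefficients that each depend on every item.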

Intuitively, the reason the tamperer cannot remove too much entropy without destroying all
data is that it cannot identify which points $d = f(k)$ correspond to actual user keys. When it
maps two polynomials $f_1$ and $f_2$ to the same corrupted store $\check{C}$, the best that the non-standard
recovery algorithm can do is return one of $f_1(k_i)$ or $f_2(k_i)$ given a particular key $k_i$. But if too
many polynomials are mapped to the same $\check{C}$, the odds that $\check{R}$ returns the value of the correct
polynomial will be small.

A complication is that a particularly clever adversary could look for polynomials whose values
overlap; if $f_1(k) = f_2(k)$, it doesn't matter which $f$ the recovery algorithm picks. But here we can
use the fact that two degree $(n-1)$ polynomials can't overlap in more than $(n-1)$ places without
being equal. This limits how much packing the adversary can do.
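
The overlap limit can be checked by brute force for small parameters. The sketch below is only an illustration: the function names and the tiny values of n and p are assumptions, and it enumerates all polynomials of degree at most n - 1 over Z_p, which is feasible only for very small cases.

```python
from itertools import product

def evaluate(coeffs, x, p):
    """Evaluate a polynomial (coefficients listed high-to-low) at x over Z_p."""
    acc = 0
    for c in coeffs:
        acc = (acc * x + c) % p
    return acc

def max_agreement(n, p):
    """Largest number of points of Z_p on which two distinct polynomials
    of degree at most n - 1 take the same value."""
    tables = [tuple(evaluate(c, x, p) for x in range(p))
              for c in product(range(p), repeat=n)]
    best = 0
    for a in range(len(tables)):
        for b in range(a + 1, len(tables)):
            agree = sum(u == v for u, v in zip(tables[a], tables[b]))
            best = max(best, agree)
    return best

overlap_n3_p5 = max_agreement(3, 5)   # 125 polynomials of degree at most 2 over Z_5
overlap_n2_p7 = max_agreement(2, 7)   # 49 polynomials of degree at most 1 over Z_7
```

In both cases the maximum overlap is n - 1, so an adversary can make at most n - 1 keys indifferent between any two distinct candidate polynomials.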




As in Theorem 15, we assume that the user inputs $d_1, \ldots, d_n$ are chosen independently and have
identical distributions. We make a further assumption that each $d_i$ is chosen uniformly from $Z_p$.
This is necessary to ensure that the resulting polynomials span the full $p^n$ possibilities. Recall that,
in Section 5.2, we argued how to get rid of the assumption that the $d_i$ are i.i.d. That argument
can also be used here to get rid of the assumption that each $d_i$ is independent and uniform.

Under  these  conditions,  sufficiently  destructive  tampering  prevents  recovery  of  any  information 
with  high  probability.  We  will  show  an  accurate  but  inconvenient  bound  on  this  probability  in 
Theorem  17  and  give  a  cruder  but  more  useful  statement  of  the  bound  in  Corollary  18. 


Theorem 17 Let $(I, E, R)$ be defined as above. Let $A = (\check{I}, \check{T}, \check{R})$ be an adversary where $\check{T}$ is
destructive: for a fixed input size and security parameter, there is a constant $M$ such that for each
key $\check{k}$,

$$|\{\check{T}(\check{k}, f)\}| < M,$$

where $f$ ranges over the possible store values, i.e. over all degree-$(n-1)$ polynomials over $Z_p$. If
the $d_i$ are drawn independently and uniformly from $Z_p$, then the probability that at least one user $i$
recovers $d_i$ using $\check{R}$ is

$$\Pr[r \ne 0^n] < \frac{2n^2 + nM^{1/n}}{p}, \qquad (3)$$

even if all users use $\check{R}$ as their recovery algorithm.


Proof: Condition on $\check{k}$ and the outcome of all coin-flips used by $\check{T}$ and $\check{R}$. Then, there are
exactly $p^n \binom{p}{n}$ possible executions, each of equal probability, determined by the $p^n$ choices for the
$d_i$ and the $\binom{p}{n}$ choices for the $k_i$. For each $i$, we will show that the number of these executions in
which $\check{R}(\check{k}, k_i, \check{C}) = d_i$ is small.

For each degree-$(n-1)$ polynomial $f$, define $f^*$ to be the function mapping each $k$ in $Z_p$ to
$\check{R}(\check{k}, k, \check{T}(\check{k}, f))$. Note that $f^*$ is deterministic given that we are conditioning on $\check{k}$ and all coin-flips
in $\check{T}$ and $\check{R}$. Define $C_f$, the correct inputs for $f$, to be the set of keys $k$ for which $f(k) = f^*(k)$.

The adversary produces a correct output only if at least one of the $n$ user keys appears in $C_f$.
For a given $f$, the probability that none of the keys appear in $C_f$ is

$$\frac{\binom{p - |C_f|}{n}}{\binom{p}{n}} > \left(\frac{p - |C_f| - n}{p}\right)^n = \left(1 - \frac{|C_f| + n}{p}\right)^n > 1 - \frac{n(|C_f| + n)}{p},$$

and so the probability that at least one key appears in $C_f$ is at most $\frac{n}{p}|C_f| + \frac{n^2}{p}$. Averaging over
all $f$ then gives

$$\Pr[f^*(k_i) = d_i \text{ for at least one } i] < \frac{n^2}{p} + \frac{n}{p^{n+1}} \sum_f |C_f|. \qquad (4)$$

We will now use the bound on the number of distinct $f^*$ to show that $\sum_f |C_f|$ is small.
Consider the set of all polynomials $f_1, f_2, \ldots, f_m$ that map to a single function $f^*$, and their
corresponding sets of correct keys $C_{f_1}, C_{f_2}, \ldots, C_{f_m}$. Because any two degree $(n-1)$ polynomials
are equal if they are equal on any $n$ elements of $Z_p$, each $n$-element subset of $Z_p$ can be contained
in at most one of the $C_{f_i}$. On the other hand, each $C_{f_i}$ contains exactly $\binom{|C_{f_i}|}{n}$ subsets of size $n$.

Since there are only $\binom{p}{n}$ subsets of size $n$ to partition between the $C_{f_i}$, we have

$$\sum_i \binom{|C_{f_i}|}{n} \le \binom{p}{n},$$

and summing over all $M$ choices of $f^*$ then gives

$$\sum_f \binom{|C_f|}{n} \le M \binom{p}{n}.$$

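We now wish to bound the maximum possible value of $\sum_f |C_f|$ given this constraint.
Observe that $\binom{|C_f|}{n} > \frac{(|C_f| - n)^n}{n!}$ when $|C_f| > n$, from which it follows that

$$\sum_{f : |C_f| > n} (|C_f| - n)^n < n! \sum_f \binom{|C_f|}{n} \le n!\, M \binom{p}{n}. \qquad (5)$$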


Now, $(|C_f| - n)^n$ is a convex function of $|C_f|$, so the left-hand side is minimized for fixed $\sum_f |C_f|$
by setting all $|C_f|$ equal. It follows that $\sum_f |C_f|$ is maximized for fixed $\sum_{f : |C_f| > n} (|C_f| - n)^n$ when all $|C_f|$ are
equal.

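Setting each $|C_f| = c$ and summing over all $p^n$ values of $f$, we get

$$p^n (c - n)^n < n!\, M \binom{p}{n},$$

from which it follows that

$$c < \frac{1}{p} \left( n!\, M \binom{p}{n} \right)^{1/n} + n,$$

and thus that

$$\sum_f |C_f| < p^n c < p^{n-1} \left( n!\, M \binom{p}{n} \right)^{1/n} + n p^n.$$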

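Plugging this bound back into (4) then gives

$$\Pr[r \ne 0^n] = \Pr[f^*(k_i) = d_i \text{ for at least one } i]$$
$$< \frac{2n^2}{p} + \frac{n}{p^2} \left( n!\, M \binom{p}{n} \right)^{1/n}$$
$$< \frac{2n^2}{p} + \frac{n}{p^2} \left( p^n M \right)^{1/n}$$
$$= \frac{2n^2 + nM^{1/n}}{p},$$

where the second inequality uses $n! \binom{p}{n} < p^n$.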
Using Theorem 17, it is not hard to compute a limit on how much information the tamperer
can remove before recovering any of the data becomes impossible:
Corollary 18 Let $(I, E, R)$ and $(\check{I}, \check{T}, \check{R})$ be as in Theorem 17. Let $\epsilon > 0$ and let $p > 4n^3/\epsilon$. If for
any fixed $\check{k}$, $\check{T}$ destroys at least $n \lg(n/\epsilon) + 1$ bits of entropy, then

$$\Pr_A[r = 0^n] > 1 - \epsilon.$$


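Proof: Let $\epsilon' = \epsilon / \left( \frac{1}{2n} + 2^{-1/n} \right)$. If $\check{T}$ destroys at least $n \lg(n/\epsilon') + 1$ bits of entropy, then we
have

$$M < p^n \cdot 2^{-(n \lg(n/\epsilon') + 1)} = \frac{1}{2} p^n (n/\epsilon')^{-n} = \frac{1}{2} (p\epsilon'/n)^n.$$

Plug this into (3) to get:

$$\Pr[\text{some } d_i \text{ is recovered}] < \frac{2n^2 + nM^{1/n}}{p} < \frac{2n^2}{4n^3/\epsilon'} + 2^{-1/n} \epsilon' = \epsilon' \left( \frac{1}{2n} + 2^{-1/n} \right) = \epsilon.$$

We thus have:

$$\Pr[r = 0^n] = 1 - \Pr[\text{some } d_i \text{ is recovered}] > 1 - \epsilon.$$

∎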
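
To get a feel for the numbers, the bound can be evaluated directly. The sketch below assumes the bound of Theorem 17 has the form $(2n^2 + nM^{1/n})/p$ and that destroying $b$ bits means $M$ is at most $p^n 2^{-b}$, so that $M^{1/n}$ is at most $p \cdot 2^{-b/n}$. The function names and the sample parameters (n = 10 users, p = 2^61 - 1) are hypothetical.

```python
import math

def recovery_bound(n, p, bits_destroyed):
    """Upper bound on Pr[some d_i is recovered], assuming the form
    (2 n^2 + n M^(1/n)) / p with M^(1/n) <= p * 2^(-bits_destroyed / n)."""
    return 2 * n**2 / p + n * 2 ** (-bits_destroyed / n)

def corollary18_bits(n, eps):
    """Number of destroyed bits named in Corollary 18: n lg(n / eps) + 1."""
    return n * math.log2(n / eps) + 1

# Assumed sample parameters: 10 users, target probability 0.01, large prime p.
n, eps, p = 10, 0.01, 2**61 - 1
threshold = corollary18_bits(n, eps)
bound_at_threshold = recovery_bound(n, p, threshold)
bound_without_tampering = recovery_bound(n, p, 0)
```

With these parameters the threshold is roughly 101 bits regardless of how large the store is, and the bound at the threshold falls below the target probability, while with no entropy destroyed the bound exceeds 1 and says nothing.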

6  Related  work 

The  problem  of  building  reliable  storage  using  untrusted  servers  has  been  studied  for  a  long  time. 
Previous  work  has  offered  several  interpretations  of  what  this  means.  We  distinguish  between 
three  basic  approaches:  replication,  tamper  detection,  and  entanglement.  We  also  discuss  the 
all-or-nothing transform, an unkeyed cryptographic tool that provides guarantees similar to
all-or-nothing integrity in a restricted version of the destructive tampering model.

6.1  Systems  based  on  replication 

Anderson, in his seminal paper describing an “Eternity Service” [1], was among the first to propose
building a network of tamper-resistant servers, spread around the world. Stored documents would
be redundantly replicated across the network, thereby making them censorship-resistant: difficult
to delete without the cooperation of all servers. Many subsequent storage systems [5,10,18]
followed in Anderson’s footsteps and relied on replication to protect the stored data. One such
system, Free Haven [6], uses Rabin’s information dispersal algorithm (IDA) [19] to break the
document into n shares, any k of which are sufficient to reconstruct the document, which are then
distributed to a community of servers. Publius [27] uses a similar idea and splits the key used to
encrypt documents into shares that are distributed to different servers. BFS [5] (an abbreviation of
“Byzantine File System”) also uses replication to develop a storage system that tolerates Byzantine
faults of up to 1/3 of the servers.

All these systems solve a problem that is different from, and complementary to, the one considered
in this report. We assume that the users treat the storage service as a monolithic entity and
cannot distinguish between good and bad servers within a larger unified service. Our reasons for
making this assumption are two-fold. First, in most of the systems described above, new servers
can join freely, which allows a resourceful and committed adversary to seize control by sending in a
few thousand of his best and closest friends. Second, we are interested in guarantees to individual
users from the system as a whole, without regard to the underlying implementation. Our definition
of all-or-nothing integrity applies just as well to distributed storage mechanisms as to centralized
ones and reflects the concerns of users who care more about whether they will get their data back
than how. Thus, even in a system that promises data availability through replication, all-or-nothing
integrity provides additional assurance to the users by guaranteeing that any failure to fulfill this
promise will carry a very high cost.

6.2  Systems  based  on  tamper  detection 

A second approach to providing secure storage is based on detecting tampering. As long as trusted
users can detect invalid modifications made to their data, the stored data is considered safe. Two
common tools used for tamper detection are digital signatures and Merkle hash trees [17]. A Merkle
tree stores the actual data at the leaves of the tree, and each intermediate node holds the hash of the
concatenation of the data at the previous level. TDB [13], for example, leverages a small amount of
trusted memory to store a collision-resistant hash of the entire database. Another storage system,
SUNDR [14, 15], uses Merkle trees to ensure that all users have a consistent view of the shared data:
if the server delays one user from seeing a change by another, the two users would never again
see each other’s changes. Other storage systems, such as BFS [5], NASD [8], S4 [24], SiRiUS [9],
and SFSRO [7], have also relied on hash trees and signatures for tamper detection.
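
A Merkle hash tree of the kind described above can be sketched as follows. This is a generic illustration, not the construction used by any of the cited systems: the function name `merkle_root`, the use of SHA-256, the one-byte prefixes that separate leaf hashes from internal hashes, and the rule of pairing an odd node with itself are all simplifying assumptions.

```python
import hashlib

def merkle_root(blocks):
    """Root hash of a binary Merkle tree built over a list of byte strings."""
    if not blocks:
        raise ValueError("need at least one data block")
    # Leaves: hash each data block (prefix 0x00 marks a leaf; an assumption).
    level = [hashlib.sha256(b"\x00" + b).digest() for b in blocks]
    while len(level) > 1:
        if len(level) % 2:
            # Odd number of nodes: pair the last node with itself (a simplification).
            level.append(level[-1])
        # Each parent is the hash of the concatenation of its two children.
        level = [hashlib.sha256(b"\x01" + level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]

# A user who keeps only the root can detect a change to any block.
original = [b"alice's data", b"bob's data", b"carol's data"]
trusted_root = merkle_root(original)
tampered = [b"alice's data", b"bob's DATA", b"carol's data"]
tamper_detected = merkle_root(tampered) != trusted_root
```

Only the 32-byte root needs to be kept in trusted storage; any modification to a leaf changes the root with overwhelming probability.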

The  guarantee  provided  by  these  systems  is  rather  weak.  A  user  whose  data  is  lost  is  likely  to 
notice  with  or  without  being  notified  by  the  system.  However,  as  we  saw  in  Section  5.1,  tamper 
detection can be leveraged to give all-or-nothing integrity if all users run standard recovery
algorithms  by  the  simple  expedient  of  having  all  users  politely  refuse  to  recover  their  data  if  the 
store  is  tampered  with.  That  some  users  might  insist  on  recovering  their  uncorrupted  data  anyway 
points  out  a  fundamental  limitation  of  both  the  standard-recovery-algorithm  model  and  the  tamper 
detection  approach. 

6.3  Systems  based  on  entanglement 

To  prevent  impolite  users  from  recovering  their  own  data  even  if  other  users’  data  have  been  lost, 
two  storage  systems  have  been  proposed  that  create  dependencies  between  blocks  of  data  belonging 
to different users: Dagster [25] and Tangler [16]. Because of their close connection to our work, we
described these systems and gave a rigorous analysis of their security properties in Section 2.
                     Destructive Tamperer    Arbitrary Tamperer
Standard Recovery    all-or-nothing          all-or-nothing
Public Recovery      all-or-nothing          symmetric recovery
Private Recovery     all-or-nothing          —

Table 1: Summary of results. “All-or-nothing” means that all-or-nothing integrity can be achieved
in this model; “symmetric recovery” means that all-or-nothing integrity cannot be achieved, but
symmetric recovery can; “—” means that no guarantees are possible.

6.4  All-or-nothing  transforms 

Motivated by security problems in block ciphers, Rivest [20] proposed a cryptographic primitive
called the all-or-nothing transform (AONT). An $l$-AONT is an efficiently computable transformation
that satisfies two conditions:

•  Its  inverse  transformation  is  also  efficiently  computable. 

•  If  l  bits  of  an  image  are  lost,  then  it  is  infeasible  to  obtain  any  information  about  the  preimage. 

Papers such as [4, 23] further examined Rivest’s AONT. AONT is similar to our notion of
all-or-nothing integrity in the sense that either all bits of the preimage can be recovered (if the image
is  available)  or  none  can  be  (if  l  bits  of  the  image  are  lost).  However,  AONT  is  radically  different 
from  all-or-nothing  integrity  because  it  does  not  involve  multiple  users,  who  possess  individual  keys. 
Moreover,  AONT  does  not  consider  the  possibility  that  the  image  may  be  corrupted  in  other  ways 
than  some  bits  being  deleted,  such  as  the  adversary  superencrypting  the  entire  data  store. 

7  Conclusion  and  future  work 

Entangling documents of different users is a promising idea for strengthening the integrity of
individual users’ data. However, existing systems such as Dagster and Tangler only have an intuitive
notion of entanglement that is insufficient by itself to provide the supposedly increased security. In
this  paper,  we  rigorously  analyze  the  probability  of  destroying  one  document  without  affecting  any 
other  documents  in  these  systems.  Our  analysis  shows  that  the  security  they  provide  is  not  strong, 
even  if  we  limit  the  class  of  attacks  permitted  against  the  entangled  data. 

Motivated  by  the  desire  to  improve  the  security  provided  by  entanglement,  we  defined  the 
stronger  notion  of  document  dependency,  in  which  destroying  some  document  is  guaranteed  to 
destroy  specific  other  documents,  and  all-or-nothing  integrity,  in  which  destroying  some  document 
is  guaranteed  to  destroy  all  other  documents.  We  considered  a  variety  of  potential  attacks  and 
showed for each what level of security was possible. These results are summarized in Table 1; they
show  that  it  is  possible  in  principle  to  achieve  all-or-nothing  integrity  with  only  limited  restrictions 
on  the  adversary. 

Whether  it  is  possible  in  practice  is  a  different  question.  Our  model  abstracts  away  most  of  the 
details  of  the  storage  and  recovery  processes,  which  hides  undesirable  features  of  our  algorithms 
such  as  the  need  to  process  all  data  being  stored  simultaneously  or  the  need  to  read  every  bit  of 
the  data  store  to  recover  any  data  item.  Some  of  these  undesirable  features  could  be  removed 
with a more sophisticated model, such as a round-based model that treats data as arriving over
time, allowing combining algorithms that require using less of the data store for each storage or
retrieval operation at the cost of making fewer documents depend on each other. The resulting
system might look like a variant of Dagster or Tangler with stronger mechanisms for entanglement.
But  such  a  model  might  permit  more  dangerous  attacks  if  the  adversary  is  allowed  to  tamper  with 
data  during  storage.  Finding  the  right  balance  between  providing  useful  guarantees  and  modeling 
realistic  attacks  will  be  necessary.  We  believe  that  we  have  made  a  first  step  towards  this  goal  in 
the  present  work,  but  much  still  remains  to  be  done. 